FROM THE SUBJECT’S POINT OF VIEW, WHEN IS 
BEHAVIOR PRIVATE AND WHEN IS IT PUBLIC: 


PROBLEMS OF INFERENCE 1 


MARTIN T. ORNE 2 


Institute of the Pennsylvania Hospital and University of Pennsylvania 


In an invited discussion, methodological questions concerning McFall’s study 
of the effect of self-monitoring on smoking behavior are raised. It is empha- 
sized that results of such studies should be evaluated from the point of view 
of the S rather than the investigator. Some measures assumed to be unobtru- 
sive by the investigator share qualities of deception experiments: It must be 
determined, therefore, whether it is the S or the E who is deceived. Procedures 
that may be helpful in clarifying such questions and the difficulties of general- 
izing results to other contexts are discussed. 


A number of very significant and topical issues 
are raised by McFall (1970). The problem of 
criterion measures has long bedeviled all psycho- 
therapy research. Behaviorally oriented therapists 
have generally been in a better position since 
they were able to specify the behavior to be 
modified. As McFall’s study clearly demonstrates, 
some of the measures commonly utilized in evalu- 
ating the effects of behavioral interventions are 
in themselves altered by the very act of collect- 
ing data. Because self-reports that require an S 
to monitor his own behavior are easily obtained 
and are generally very sensitive to change, the 
technique has been widely utilized. However, if 
requiring an S to report his own behavior in and 
of itself modifies the behavior, care must be 
taken in accepting at face value findings based 
on such evidence. 

In an elegant way McFall reasons that if ask- 
ing an S to monitor his behavior will have an 
effect on the behavior under surveillance, it should 
make a difference what the S is asked to monitor. 
He illustrates this general point by translation 
into a timely and important paradigm: It is 
argued that requiring an individual to report the 
number of cigarettes smoked may have an aver- 
sive conditioning effect, and asking him instead 
to indicate each time he successfully resists the 
temptation to smoke should therefore reduce 
cigarette consumption. Empirically, his study 
demonstrates not only that asking Ss to note 
each time they smoke a cigarette will affect their 
actual smoking behavior, but he also shows a 
clear-cut difference in the amount smoked, de- 
pending on whether Ss were asked to record the 
frequency of the impulse to smoke or the number 
of cigarettes consumed. This finding, potentially 
of very great practical as well as theoretical 
importance, well deserves further investigation. 
While the conclusions reached seem intuitively 
compelling, it is unclear to me whether, from the 
S’s point of view in this experiment, the behavior 
may have other alternative and instructive mean- 
ing. Because the present study not only deals 
with an extremely interesting problem, but also 
was carefully carried out and executed, it serves 
to illustrate major differences in the interpreta- 
tion of the data between analyzing the experiment 
from the viewpoint of the S rather than the O. 
These views would be obscured in a piece of re- 
search that was inferior and marred by obvious 
methodological flaws. However, because of basic 
issues of interpretation, of how the experiment 
might have been perceived by the S, I suspect 
that some of the findings in the study are unique 
to the particular experimental situation employed 
and will not generalize outside of the specific 
experimental setting. It should be clear that 
my comments are intended not as a critique 
of an interesting and worthwhile investigation, 
but rather to raise issues for future research. 
In the analysis to follow, I will make assump- 
tions about what I believe are plausible al- 
ternative explanations of the author’s findings. 
The data to support these assumptions are not 
for the most part available; nonetheless, it seems 
fruitful to state these possibilities in a positive 
fashion which will lend itself to empirical testing. 

The observation that asking a patient to monitor 
his own behavior has consequences for the 
behavior is generally well recognized by clinicians. 
A particularly good example is the common ob- 
servation of physicians working with obesity— 
when patients can be induced to keep an accurate 
diary of their food intake for the purpose of 
“establishing a base line,” a loss of weight is 
usually observed (A. J. Stunkard, personal com- 
munication, 1969). It would seem that the patient 
who is constrained by the procedure to admit his 
actual food intake both to himself and the thera- 
pist, tends to forego some of the items he would 
otherwise have consumed. Unfortunately it is 
difficult to persuade patients to keep an honest 
record of their food intake. 

The reactive aspects of keeping a food diary 
can be evaluated relatively easily because the pa- 
tient’s weight serves as a convenient and reliable 
measure of total caloric intake. The loss of 
weight, therefore, reflects a decrease in eating 
behavior. Unfortunately, no equally reliable way 
is available to test the individual’s overall ciga- 
rette consumption, yet such an overall measure 
would be of great importance in evaluating pro- 
cedures designed to curtail smoking. Therefore, 
McFall had to develop analogous information in 
a more complex fashion. 

In order to draw inference about the effect of 
instructions on smoking behavior in general, it 
was necessary to utilize detailed observations of 
this behavior during a specified period where ob- 
servational data could be obtained. The author 
uses the number of cigarettes smoked during 
classroom hours as an index of overall smoking 
behavior. Unfortunately classroom smoking is not 
necessarily a representative sample of smoking 
behavior in general. Some individuals tend to 
smoke primarily in situations of emotional stress, 
others when they are working, others when they 
are bored, still others in social situations when 
they need a prop, etc. The effectiveness of in- 
structions in modifying overall smoking behavior 
would not necessarily be reflected adequately by 
smoking that could be observed in the classroom. 
Beyond this general problem, however, there are 
more serious difficulties in generalizing from such 
data. 

The study was carried out using the investi- 
gator’s own students in a psychology class. The 
nonsmokers, as determined by a questionnaire, 
were asked to participate as Os to keep track of 
the behavior of the smoking individuals who 
would later be the Ss of the experiment. In order 
to do this, all students were required to fill in a 
questionnaire about many aspects of their behavior 
when the class first met; nonsmokers were 
then segregated (presumably on a totally un- 
related basis), asked to stay after class, and their 
cooperation as collaborators was elicited and ob- 
tained. These Os were each assigned to observe 
one smoking S seated close to them; it was their 
task to keep track of the number of cigarettes 
smoked—initially during a two-week period of 
actual base line prior to any mention of smoking 
behavior during the course, and later after the 
smoking members of the class had been asked to 
monitor their own smoking behavior and report 
it. At that time, half of the smokers were asked 
to monitor the number of cigarettes smoked, and 
the other half, the number of times they felt like 
smoking but did not. 

The purpose of eliciting cooperation from the 
nonsmoking Os is to test the reactivity of self- 
monitoring by the smoking Ss. It is, of course, 
essential that these individuals be unaware that 
their smoking behavior is being monitored, either 
during their base line or subsequently. In many 
ways, then, this experiment has the attributes of 
a deception study, in that it depends on the O's 
keeping accurate track of his target S’s smoking 
behavior without letting S know he is being ob- 
served. While only one smoking S admitted that 
his nonsmoking O had told him about the procedure, 
it seems unlikely that this was limited to 
only one S. Ss did not volunteer for their 
task but were drafted into the roles of O and S; 
the investigator’s interest in smoking behavior 
was likely known on campus; and Ss probably had 
contact with each other outside of class. It 
therefore seems plausible that what I have 
described as a “pact of ignorance” (Orne, 1962) 
could have existed between the investigator and 
several of his Ss. 
The fact that it was necessary for the investigator 
to model smoking in order to obtain a sufficiently 
high base rate increased the odds that the smoking 
S recognized his role in the experiment. As in any 
study involving deception, it is essential to determine 
the extent to which it is the S or the E 
who is deceived. (For a discussion of these issues, 
see Orne & Holland, 1968.) 

Among the findings reported, one of the more 
interesting is the discrepancy between the number 
of cigarettes reported smoked by the Ss, as com- 
pared with those reported by the Os of these Ss. 
Typically, Os reported fewer cigarettes smoked 
than did the Ss themselves who were instructed 
to report their smoking behavior. No self-report 
data about smoking are available for those Ss 
who had been instructed to report only the impulse 
to smoke. However, smoking a cigarette takes a 
certain period of time, and it is difficult to see 
how an O, given the relatively easy task of keeping 
track of one target S, could consistently fail to 
accurately count the number of cigarettes smoked. 
In trying to clarify this puzzling observation, 
McFall reports an item of data that was 
unintentionally collected concerning a day the 
class did not meet. Several of the Ss monitoring 
their own smoking behavior—though only one of 
the Os—reported smoking data for that particular 
day, supporting the belief that smoking Ss made 
up their self-report data at some later date, rather 
than noting down the number of cigarettes 
smoked each day as they had been instructed 
to do. 

Quite appropriately, McFall points to the need 
of independent checks on the validity of self- 
monitoring data and the importance of not con- 
founding self-report and task motivation. It 
seems, however, that the factors which may de- 
termine the data obtained in the reported study 
may be even more subtle and complex. Thus, 
when comparing the base-line performance with 
that during the experimental period, it was shown 
that there was a significant increase in smoking 
behavior among those Ss instructed to report 
the number of cigarettes smoked and an equally 
dramatic decrease among those Ss instructed to 
report the number of times they felt like smok- 
ing but did not. 

To understand what appears to have occurred 
in this study, it is essential to consider the ex- 
periment from the student-volunteer participant’s 
point of view. Ignoring for a moment what prob- 
ably were at least partially unsuccessful decep- 
tion aspects of the experiment, the student 
smoker finds himself in a class with a professor 
interested in smoking behavior. There is a 
“No Smoking” sign in the room. However, not 
only does the professor ignore the sign personally, 
but he also explains that it is relevant only 
to afternoon classes. Finally, the class is asked 
to participate in an experiment on smoking, being 
assigned to one of two groups. He is aware that 
in addition to Ss receiving his instructions, there 
are others receiving other instructions. As I 
have indicated, smoking is a conspicuous behavior. 
From the S’s point of view, whether he 
smokes or not is clearly obvious to his instructor, 
who is also the E. While the investigator may feel 
that the task of keeping track of smoking behavior 
of his 16 Ss is superhuman, no such assumption 
needs to be made by the Ss. The fact 
is that the investigator himself could easily have 
kept track of two or even three Ss’ smoking behavior 
on any given day without difficulty; in no 
way, therefore, ought one to assume that Ss considered 
their smoking in this class as private behavior. 
On the contrary, they likely knew the 
investigator’s interest, realized that they were 
participating in a study important to him, and 
undoubtedly surmised that he had hypotheses 
about what they would do. 

The actual smoking behavior of the Ss as re- 
ported supports such a conjecture. Thus, whereas 
self-reported smoking behavior would tend to 
lead to a decreased base line in most instances, 
it was found here to have led to a dramatic in- 
crease in reported behavior. It is likely that the 
professor communicated, by his modeling be- 
havior, that he wanted smoking behavior and ex- 
pected it to be augmented. 

Such an interpretation is consonant with the 
even greater shift in the cigarette consumption of 
the no-smoke group. Again, it stands to reason 
that asking Ss to report whenever they felt like 
smoking but did not is an implicit request to de- 
crease one’s smoking. It is not surprising there- 
fore that Ss did so dramatically. Unfortunately, 
these findings are documented only for the public 
smoking behavior. Public, in this sense, is in- 
tended to mean smoking behavior in the sight of 
and with the knowledge of the professor. Smok- 
ing behavior had, of course, been designated as 
an experimentally relevant variable. Without de- 
tailed inquiry data, it is difficult to know much 
about the Ss’ motives and the reasons why the 
one group increased their smoking behavior above 
base line. The possibility that they expected a 
request to stop smoking at some future time or 
some kind of crossover design cannot be excluded. 
Expectational effects of this kind on “base-line 
performance” have been elegantly demonstrated 
by Zamansky, Scharf, and Brightbill (1964) in 
hypnotic research. 

A discussion of this kind when translated into 
experimental terms invariably emphasizes the 
need for additional control groups. Unfortunately, 
it is impossible to design an experiment where a 
colleague cannot think of additional groups that 
should have been run. The solution does not lie 
in increasingly complex factorial designs, but 
rather in a recognition of the limitations inherent 
in any single study. In this instance I have em- 
phasized the public, nonprivate character of the 
period when smoking behavior was being re- 
corded, because I believe such public behavior 
could, for short periods, easily be modified. For 
example, if a class which included smokers were 
divided into two groups, half of which were asked 
by the instructor, “Please, for the sake of an 
experiment that is important, increase the amount 
of smoking you do over the next few lectures,” 
and the other half were asked, “Please, for the 
sake of an experiment that is important, decrease 
the amount of smoking you do over the next few 
periods,” one might reasonably anticipate significant 
effects. Assuming that findings of this 
kind were easily obtainable, extreme caution 
would be essential in interpreting the data in 
McFall’s study. In other words, if we are obtaining 
data under conditions where Ss can easily 
alter their performance without much cost to 
themselves, inference must be drawn exceedingly 
cautiously. To again use the example of obesity, 
it is one thing to demonstrate that a given diet 
results in modification of eating behavior when- 
ever the patient eats with his therapist, but it is 
quite another to demonstrate weight loss over a 
prolonged period (though even the latter could, 
of course, be affected by simple instructions in 
some instances). In the case of the former, such 
a finding would have little interest, while in the 
case of the latter, we would weigh it entirely 
differently. 

There are some situations when problems of 
compliance are relatively unimportant, such as 
when the test procedures involve stable maximal 
performances, as in the case of athletes. Thus, it 
would be of interest to demonstrate that instruc- 
tions can shorten the time required for an ex- 
perienced track man to cover a mile, whereas to 
show that instructions can lead to an increase in 
time would not necessarily have much signifi- 
cance. The same is true with studies involving 
endurance, memory, learning, etc.—all instances 
where reliable, stable base lines are obtained 
under motivated conditions. Even then, however, 
it would seem best to determine the ease with 
which such a base line could be modified by ask- 
ing an individual to do so. Whenever a situation 
exists where S can easily alter his performance in 
a given direction when asked to do so, caution is 
essential in interpreting similar data obtained 
from human Ss, even if the techniques employed 
appear different. 

One way in which some of these problems can 
be circumvented is to study what the S believes 
to be private behavior. Those measures that are 
believed to be nonreactive all share the quality 
that it is intended that S does not recognize our 
interest in them (see Webb, Campbell, Schwartz, 
& Sechrest, 1966). Under these circumstances, we 
may reasonably hope that he is unaware of the 
measure. However, most of these measures are 
nonreactive only to the extent that S fails to 
recognize our interest. They, therefore, share 
many of the qualities of deception experiments 
and require special procedures (which I have 
called quasi-controls: Orne, 1969) in order to 
make certain whether S did or did not perceive 
the situation to involve private behavior. 

The study of the effects of self-monitoring on 
normal smoking behavior utilizes a nonrepresenta- 
tive public sample of smoking behavior in order 
to draw inference about how much the individual 
actually smokes. It seems likely to me that the 
findings were a function of the demand charac- 
teristics of the particular experimental situation 
and therefore do not allow more general infer- 
ence. Thus, the experimental data need not re- 
flect the Ss’ overall smoking behavior. Intuitively 
I am convinced the conclusions about potential 
effects of self-monitoring are correct, while I am 
equally certain, without the benefit of specific 
data, that the findings concerning the effects of 
differential instructions will not generalize beyond 
the experimental situation.3 

The kind of demonstration that would be re- 
quired to validate the effect of instructional in- 
terventions in smoking behavior is extremely dif- 
ficult to obtain. The same is true in many other 
clinical situations, and probably because of these 
difficulties, many behavior therapists have chosen 
to ignore the problems inherent in self-reporting 
and use this measure as though it were fully valid. 
To the extent that one is dealing with patients 
who are paying for treatment, as opposed to ex- 
perimental Ss who are being paid, the motiva- 
tional factors may be given different weights. Not 
that patients’ self-reports are necessarily more 
accurate, but in the absence of a highly meaning- 
ful relationship, the individual who seeks help 
is more likely to be concerned about what the 
data mean to him than what they mean to the 
therapist-experimenter. However, the need for in- 
dependent evaluation of the findings remains. To 
the extent that the evaluation procedures are 
public and often of short duration, the possibility 
that changes obtained in them may not be repre- 
sentative of the patient’s overall behavior must 
be kept in mind. For example, the many snake 
phobia studies which follow the Lang and Lazovik 
(1963) model all use a behavioral snake-avoidance 
test. The possibility that this test is reactive 
to the implied wishes of the E cannot be 
excluded (D. A. Bernstein, personal communication, 
1969). Thus, Orne and Evans (1965) showed 
that unselected Ss could be induced to carry out 
apparently dangerous and self-destructive actions 
which neither other Ss nor colleagues thought 
likely. 


3 That is, the effect would not be materially dif- 
ferent from communicating to the S the E’s desire 
that he should increase or decrease his smoking be- 
havior. 
There are no simple ways around the problems 
inherent in criterion measures of change. They 
will have to be evaluated in each instance. For- 
tunately, they become less serious when we are 
dealing with items of behavior that the S can- 
not or will not undertake prior to treatment. 
The minimum requirement would be to show 
that a simple request is not effective in eliciting 
the particular behavior at the onset. Further, 
when we are dealing with clear-cut end points, 
Ss’ reports tend to become more reliable. For 
example, a statement by a patient that he has 
decreased the number of cigarettes consumed is 
likely to be far less reliable than a statement that 
he has stopped smoking over an extended period 
of time. The latter report, in order to be inac- 
curate, would demand that the patient con- 
sciously lie, whereas the former statement could 
easily be subject to self-deception. 

In many instances where we are dealing with 
serious disturbances of functioning, it is possible 
to obtain evidence about the individual at work, 
in school, or similar situations where others can 
corroborate self-reports. Such reports by others 
are generally less responsive to subtle changes; 
they also tend to be more reliable. Here, again, 
reports of quantitatively greater changes are 
more likely to be trustworthy. It would seem, 
then, that in order to study the effect of any 
therapeutic technique, it would be best to use S 
populations where large and unequivocal changes 
beyond the individual’s voluntary capacity at the 
onset of treatment can be obtained. The use of 
college student volunteers not only makes in- 
ference about treatment tenuous, but also has 
limited scientific significance due to the general 
inadequacy of the criterion measures that are 
employed. The technique of using extreme cases, 
be they severe behavior pathology or other ex- 
amples of profound effects, not only has face 
validity and therapeutic significance, but also 
may be the most effective way of minimizing 
those aspects of the experimental situation which 
interfere with reliable and logically valid findings. 

To summarize, McFall’s basic observation that 
self-report measures may in themselves be reac- 
tive is likely to be true, and while I do not think 
that his other conclusions are likely to be gen- 
eralizable beyond the concrete experimental situ- 
ation, the study itself raises important issues. 
Hopefully, recognition of the possible distortions 
introduced by various criterion measures of 
change employed in evaluating treatment will lead 
to more sophisticated and less easily influenced 
criteria. 